
CS267 Fall 2018 Practice Final

To study for the final I would suggest you:
(1) Know how to do (by heart) all the practice problems.
(2) Go over your notes at least three times. On the second and third passes, try to see how much you can remember from the first.
(3) Go over the homework problems.
(4) Try to create your own problems similar to the ones I have given and solve them.
(5) Skim the relevant sections from the book.
(6) If you want to study in groups, at this point you are ready to quiz each other.

The practice final is below. Here are some facts about the actual final:
(a) It is comprehensive.
(b) It is closed book, closed notes. Nothing will be permitted on your desk except your pen (or pencil) and the test.
(c) You should bring photo ID.
(d) There will be more than one version of the test. Each version will be of comparable difficulty.
(e) It has 10 problems (2pts each): 6 problems will be on material covered since the second midterm, and 4 problems will be on the topics of the midterm.
(f) Two problems will be taken exactly (modulo typos) from the practice final, and one will be taken from the practice midterm.

Explain in English what the `P_1` and `P_2` components used to derive the DFR formula mean. Explain how `P_2` is calculated.
Explain how document length normalization is incorporated into the basic DFR formula.
Define the terms and give an example (a) document partitioning, (b) term partitioning.
Suppose we are computing queries using a parallel document partitioned set-up with 2 servers, each returning 2 results, where we want the top 3 results. What is the probability that we succeed? (Work it out using the recursive formula.)
Explain how query processing can be done in the term partitioning set-up and where one can have intra-query parallelism.
Give a map reduce algorithm which takes as input (docid, document) tuples and outputs (term_id, num_occurrences_term*num_occurrences_term) pairs.
Briefly explain how the OPIC Document Quality Measure works.
Define and explain the dangling node matrix and teleporter matrix components of the Google matrix.
Give concrete examples of how the matrices used in HITS and SALSA differ.
Describe where/in what steps a distributed file system might be used in the computation of PageRank as a map reduce job.
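
As a study aid for the map reduce problem above, here is a minimal sketch of one possible approach, simulated in memory in plain Python, not on a real cluster. It rests on several assumptions that are not stated in the problem: num_occurrences_term is taken to mean the term's total count across all input documents, the lowercased term string stands in for term_id, and tokenization is a simple whitespace split. The names map_phase, reduce_phase and run_job are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, document):
    """Mapper: emit (term, 1) for every term occurrence in one document.

    Assumption: the lowercased term string stands in for term_id.
    """
    for term in document.lower().split():
        yield term, 1


def reduce_phase(term, counts):
    """Reducer: sum all occurrences of one term, then square the total."""
    total = sum(counts)
    return term, total * total


def run_job(records):
    """Simulate the shuffle step in memory by grouping mapper output by key."""
    grouped = defaultdict(list)
    for doc_id, document in records:
        for term, one in map_phase(doc_id, document):
            grouped[term].append(one)
    return dict(reduce_phase(term, counts) for term, counts in grouped.items())


# Hypothetical toy input: (docid, document) tuples.
records = [(1, "a rose is a rose"), (2, "a daisy")]
result = run_job(records)
```

Note that the squaring happens only in the reducer, after the full sum is known. A combiner could safely pre-sum the partial counts, but it could not square them, since the square of a sum is not the sum of the squares.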